---
title: "Tips and Tricks"
permalink: /docs/tips-and-tricks/
excerpt: "Various tips and tricks applicable to most tasks."
last_modified_at: 2020/12/08 15:39:29
toc: true
---

This section contains various tips and tricks applicable to most tasks in the library.

## Visualization support

The [Weights & Biases](https://www.wandb.com/) framework is supported for visualizing model training.

To use this, simply set a project name for W&B in the `wandb_project` attribute of the `args` dictionary. This will log all hyperparameter values, training losses, and evaluation metrics to the given project.

```python
model = ClassificationModel('roberta', 'roberta-base', args={'wandb_project': 'project-name'})
```

For a complete example, see [here](https://medium.com/skilai/to-see-is-to-believe-visualizing-the-training-of-machine-learning-models-664ef3fe4f49).


## Using early stopping

Early stopping is a technique used to prevent model overfitting. In a nutshell, the idea is to periodically evaluate the performance of a model against a test dataset and terminate the training once the model stops improving on the test data.

The exact conditions for early stopping can be adjusted as needed using a model's configuration options.

**Note:** Refer the configuration options table for more details. (`early_stopping_consider_epochs`, `early_stopping_delta`, `early_stopping_metric`, `early_stopping_metric_minimize`, `early_stopping_patience`)
{: .notice--info}

You must set `use_early_stopping` to `True` in order to use early stopping.

```python
from simpletransformers.classification import ClassificationModel, ClassificationArgs


model_args = ClassificationArgs()
model_args.use_early_stopping = True
model_args.early_stopping_delta = 0.01
model_args.early_stopping_metric = "mcc"
model_args.early_stopping_metric_minimize = False
model_args.early_stopping_patience = 5
model_args.evaluate_during_training_steps = 1000

model = ClassficationModel("bert", "bert-base-cased", args=model_args)
```

With this configuration, the training will terminate if the `mcc` score of the model on the test data does not improve upon the best `mcc` score by at least `0.01` for 5 consecutive evaluations. An evaluation will occur once for every `1000` training steps.

**Pro tip:** You can use the evaluation during training functionality without invoking early stopping by setting `evaluate_during_training` to `True` while keeping `use_early_stopping` as `False`.
{: .notice--success}


## Additional Evaluation Metrics

Task-specific Simple Transformers models each have their own default metrics that will be calculated when a model is evaluated
on a dataset. The default metrics have been chosen according to the task, usually by looking at the metrics used in standard benchmarks for that task.

However, it is likely that you will wish to calculate your own metrics depending on your particular use case. To facilitate this, all `eval_model()` and `train_model()` methods in Simple Transformers accepts keyword-arguments consisting of the name of the metric (str), and the metric function itself. The metric function should accept two inputs, the true labels and the model predictions (sklearn format).

```python
from simpletransformers.classification import ClassificationModel
import sklearn


model = ClassficationModel("bert", "bert-base-cased")

model.train_model(train_df, acc=sklearn.metrics.accuracy_score)

model.eval_model(eval_df, acc=sklearn.metrics.accuracy_score)
```

**Pro tip:** You can combine the additional evaluation metrics functionality with early stopping by setting the name of your metrics function as the `early_stopping_metric`.
{: .notice--success}


## Simple-Viewer (Visualizing Model Predictions with Streamlit)

Simple Viewer is a web-app built with the [Streamlit](https://www.streamlit.io/) framework which can be used to quickly try out trained models.

To start Simple Viewer, run the command `simple-viewer`.

When Simple Viewer is started, it will look for Simple Transfomers models in the current directory and any subdirectories. All detected models can be found in the `Choose Model` dropdown. Alternatively, you can load a model by specifying the Simple Transformers task, model type, and model name (model type and model name follows the usual Simple Transformers conventions). The model name may be the path to a local model, or it may be the model name for a model from the Hugging Face model [hub](https://huggingface.co/models).

The following Simple Transformers tasks are currently supported:

- Classification
- Multi-Label Classification
- Named Entity Recognition
- Question Answering

## Hyperparameter Optimization

Machine learning models can be very sensitive to the hyperparameters used to train them. While large models like Transformers can perform well across a relatively wider hyperparameter range, they can also break completely under certain conditions (like training with large learning rates for many iterations).

**Hint:** We can define two kinds of parameters used to train Transformer models. The first is the learned parameters (like the model weights) and the second is hyperparameters. To give a high-level description of the two kinds of parameters, the hyperparameters (learning rate, batch sizes, etc.) are used to control the process of *learning* learned parameters.
{: .notice--success}

Choosing a good set of hyperparameter values plays a huge role in developing a state-of-the-art model. Because of this, Simple Transformers has native support for the excellent [W&B Sweeps](https://docs.wandb.com/sweeps) feature for automated hyperparameter optimization.

How to perform hyperparameter optimization with Simple Transformers and W&B Sweeps (Adapted from W&B [docs](https://docs.wandb.com/sweeps)):

### 1. Setup the sweep

The sweep can be configured through a Python dictionary (`sweep_config`). The dictionary contains at least 3 keys;

1. `method` -- Specifies the search strategy

    | `method` | Meaning                                                                                                                                                                                      |
    | -------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
    | grid     | Grid search iterates over all possible combinations of parameter values.                                                                                                                     |
    | random   | Random search chooses random sets of values.                                                                                                                                                 |
    | bayes    | Bayesian Optimization uses a gaussian process to model the function and then chooses parameters to optimize probability of improvement. This strategy requires a metric key to be specified. |

2. `metric` -- Specifies the metric to be optimized

    *This should be a metric that is logged to W&B by the training script*

    The `metric` key of the `sweep_config` points to another Python dictionary containing the `name`, `goal`, and (optionally) `target`.

    | sub-key | Meaning                                                                                                                                                                                                                                                                             |
    | ------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
    | name    | Name of the metric to optimize                                                                                                                                                                                                                                                      |
    | goal    | `"minimize"` or `"maximize"` (Default is `"minimize"`)                                                                                                                                                                                                                              |
    | target  | Value that you'd like to achieve for the metric you're optimizing. When any run in the sweep achieves that target value, the sweep's state will be set to "Finished." This means all agents with active runs will finish those jobs, but no new runs will be launched in the sweep. |

3. `parameters` -- Specifies the hyperparameters and their values to explore

    The `parameters` key of the `sweep_config` points to another Python dictionary which contains all the hyperparameters to be optimized and their possible values. Generally, these will be any combination of the `model_args` for the particular Simple Transformers model.

    W&B offers a variety of ways to define the possible values for each parameter, all of which can be found in the [W&B docs](https://docs.wandb.com/sweeps/configuration#parameters). The possible values are also represented using a Python dictionary. Two common methods are given below.

    1. Discrete values

        A dictionary with the key `values` pointing to a Python list of discrete values.

    2. Range of values

        A dictionary with the two keys `min` and `max` which specifies the minimum and maximum values of the range. *The range is continuous if `min` and `max` are floats and discrete if `min` and `max` are ints.*

Example `sweep_config`:

```python
sweep_config = {
    "method": "bayes",  # grid, random
    "metric": {"name": "train_loss", "goal": "minimize"},
    "parameters": {
        "num_train_epochs": {"values": [2, 3, 5]},
        "learning_rate": {"min": 5e-5, "max": 4e-4},
    },
}
```

### 2. Initialize the sweep

Initialize a W&B sweep with the config defined earlier.

```python
sweep_id = wandb.sweep(sweep_config, project="Simple Sweep")
```

### 3. Prepare the data and default model configuration

In order to run our sweep, we must get our data ready. This is identical to how you would normally set up datasets for training a Simple Transformers model.

For example;

```python
# Preparing train data
train_data = [
    ["Aragorn was the heir of Isildur", "true"],
    ["Frodo was the heir of Isildur", "false"],
]
train_df = pd.DataFrame(train_data)
train_df.columns = ["text", "labels"]

# Preparing eval data
eval_data = [
    ["Theoden was the king of Rohan", "true"],
    ["Merry was the king of Rohan", "false"],
]
eval_df = pd.DataFrame(eval_data)
eval_df.columns = ["text", "labels"]
```

Next, we can set up the default configuration for the Simple Transformers model. This would include any `args` that are not being optimized through the sweep.

**Hint:** As a rule of thumb, it might be a good idea to set all of `reprocess_input_data`, `overwrite_output_dir`, and `no_save` to `True` when running sweeps.
{: .notice--success}

```python
model_args = ClassificationArgs()
model_args.reprocess_input_data = True
model_args.overwrite_output_dir = True
model_args.evaluate_during_training = True
model_args.manual_seed = 4
model_args.use_multiprocessing = True
model_args.train_batch_size = 16
model_args.eval_batch_size = 8
model_args.labels_list = ["true", "false"]
model_args.wandb_project = "Simple Sweep"
```

### 4. Set up the training function

W&B will call this function to run the training for a particular sweep run. This function must perform 3 critical tasks.

1. Initialize the `wandb` run
2. Initialize a Simple Transformers model and pass in `sweep_config=wandb.config` as a `kwarg`.
3. Run the training for the Simple Transformers model.

*`wandb.config` contains the hyperparameter values for the current sweeps run. Simple Transformers will update the model `args` accordingly.*

An example training function is shown below.

```python
def train():
    # Initialize a new wandb run
    wandb.init()

    # Create a TransformerModel
    model = ClassificationModel(
        "roberta",
        "roberta-base",
        use_cuda=True,
        args=model_args,
        sweep_config=wandb.config,
    )

    # Train the model
    model.train_model(train_df, eval_df=eval_df)

    # Evaluate the model
    model.eval_model(eval_df)

    # Sync wandb
    wandb.join()

```

In addition to the 3 tasks outlined earlier, the function also performs an evaluation and manually syncs the W&B run.

**Hint:** This function can be reused across any Simple Transformers task by simply replacing `ClassificationModel` with the appropriate model class.
{: .notice--success}


### 5. Run the sweeps

The following line will execute the sweeps.

```python
wandb.agent(sweep_id, train)
```

### 6. Putting it all together

```python
import logging

import pandas as pd
import sklearn

import wandb
from simpletransformers.classification import (
    ClassificationArgs,
    ClassificationModel,
)

sweep_config = {
    "method": "bayes",  # grid, random
    "metric": {"name": "train_loss", "goal": "minimize"},
    "parameters": {
        "num_train_epochs": {"values": [2, 3, 5]},
        "learning_rate": {"min": 5e-5, "max": 4e-4},
    },
}

sweep_id = wandb.sweep(sweep_config, project="Simple Sweep")

logging.basicConfig(level=logging.INFO)
transformers_logger = logging.getLogger("transformers")
transformers_logger.setLevel(logging.WARNING)

# Preparing train data
train_data = [
    ["Aragorn was the heir of Isildur", "true"],
    ["Frodo was the heir of Isildur", "false"],
]
train_df = pd.DataFrame(train_data)
train_df.columns = ["text", "labels"]

# Preparing eval data
eval_data = [
    ["Theoden was the king of Rohan", "true"],
    ["Merry was the king of Rohan", "false"],
]
eval_df = pd.DataFrame(eval_data)
eval_df.columns = ["text", "labels"]

model_args = ClassificationArgs()
model_args.reprocess_input_data = True
model_args.overwrite_output_dir = True
model_args.evaluate_during_training = True
model_args.manual_seed = 4
model_args.use_multiprocessing = True
model_args.train_batch_size = 16
model_args.eval_batch_size = 8
model_args.labels_list = ["true", "false"]
model_args.wandb_project = "Simple Sweep"

def train():
    # Initialize a new wandb run
    wandb.init()

    # Create a TransformerModel
    model = ClassificationModel(
        "roberta",
        "roberta-base",
        use_cuda=True,
        args=model_args,
        sweep_config=wandb.config,
    )

    # Train the model
    model.train_model(train_df, eval_df=eval_df)

    # Evaluate the model
    model.eval_model(eval_df)

    # Sync wandb
    wandb.join()


wandb.agent(sweep_id, train)

```

**Hint:** This script can also be found in the `examples` directory of the Github repo.
{: .notice--success}

To visualize your sweep results, open the project on W&B. Please refer to [W&B docs](https://docs.wandb.com/sweeps/visualize-sweep-results) for more details on understanding the results.

**Guide:** Guide for hyperparameter optimization [here](https://towardsdatascience.com/hyperparameter-optimization-for-optimum-transformer-models-b95a32b70949?source=friends_link&sk=7d19ce15c9ac1230642d826b9deeb638).
{: .notice--success}

## Custom Parameter Groups (Freezing Layers)

Simple Transformers supports custom parameter groups which can be used to set different learning rates for different layers in a model, freeze layers, train only the final layer, etc.

All Simple Transformers models supports the following three configuration options for setting up custom parameter groups.

### Custom parameter groups

`custom_parameter_groups` offers the most granular configuration option. This should be a list of Python dicts where each dict contains a `params` key and any other optional keys matching the keyword arguments accepted by the optimizer (e.g. `lr`, `weight_decay`). The value for the `params` key should be a list of named parameters (e.g. `["classifier.weight", "bert.encoder.layer.10.output.dense.weight"]`)

**Hint:** All Simple Transformers models have a `get_named_parameters()` method that returns a list of all parameter names in the model.
{: .notice--success}

```python
model_args = ClassificationArgs()
model_args.custom_parameter_groups = [
    {
        "params": ["classifier.weight", "bert.encoder.layer.10.output.dense.weight"],
        "lr": 1e-2,
    }
]
```

### Custom layer parameters

`custom_layer_parameters` makes it more convenient to set the optimizer options for a given layer or set of layers. This should be a list of Python dicts where each dict contains a `layer` key and any other optional keys matching the keyword arguments accepted by the optimizer (e.g. `lr`, `weight_decay`). The value for the `layer` key should be an `int` (must be numeric) which specifies the layer (e.g. `0`, `1`, `11`).

```python
model_args = ClassificationArgs()
model_args.custom_layer_parameters = [
    {
        "layer": 10,
        "lr": 1e-3,
    },
    {
        "layer": 0,
        "lr": 1e-5,
    },
]
```

**Note:** Any named parameters specified through `custom_layer_parameters` with `bias` or `LayerNorm.weight` in the name will have their `weight_decay` set to `0.0`. This also happens for any parameters **not specified** in either `custom_parameter_groups` or in `custom_layer_parameters` but **does not happen** for parameters specified through `custom_parameter_groups`.
{: .notice--info}

{% capture notice-text %}

Note that `custom_parameter_groups` has *higher priority* than `custom_layer_parameters` as `custom_parameter_groups` is more specific. If a parameter specificed in `custom_parameter_groups` also happens to be in a layer specified in `custom_layer_parameters`, that particular parameter will be assigned to the parameter group specified in `custom_parameter_groups`.

For example:

```python
model_args = ClassificationArgs()
model_args.custom_layer_parameters = [
    {
        "layer": 10,
        "lr": 1e-3,
    },
    {
        "layer": 0,
        "lr": 1e-5,
    },
]
model_args.custom_parameter_groups = [
    {
        "params": ["classifier.weight", "bert.encoder.layer.10.output.dense.weight"],
        "lr": 1e-2,
    }
]
```

Here, `"bert.encoder.layer.10.output.dense.weight"` is specified in both the `custom_parameter_groups` and the `custom_layer_parameters`. However, `"bert.encoder.layer.10.output.dense.weight"` will have a `lr` of `1e-2` due to the higher precedence of `custom_parameter_groups`.

{% endcapture %}

<div class="notice--success">
  <p><strong>Order of precedence:</strong></p>
  {{ notice-text | markdownify }}
</div>

**Hint:** Any parameters not specified in either `custom_parameter_groups` or in `custom_layer_parameters` will be assigned the general values from the model args.
{: .notice--success}

### Train custom parameters only

The `train_custom_parameters_only` option is used to facilitate the training of specific parameters only. If `train_custom_parameters_only` is set to `True`, only the parameters specified in either `custom_parameter_groups` or in `custom_layer_parameters` will be trained.

For example, to train only the Classification layers of a `ClassificationModel`:

```python
from simpletransformers.classification import ClassificationModel, ClassificationArgs
import pandas as pd
import logging


logging.basicConfig(level=logging.INFO)
transformers_logger = logging.getLogger("transformers")
transformers_logger.setLevel(logging.WARNING)

# Preparing train data
train_data = [
    ["Aragorn was the heir of Isildur", 1],
    ["Frodo was the heir of Isildur", 0],
]
train_df = pd.DataFrame(train_data)
train_df.columns = ["text", "labels"]

# Preparing eval data
eval_data = [
    ["Theoden was the king of Rohan", 1],
    ["Merry was the king of Rohan", 0],
]
eval_df = pd.DataFrame(eval_data)
eval_df.columns = ["text", "labels"]

# Train only the classifier layers
model_args = ClassificationArgs()
model_args.train_custom_parameters_only = True
model_args.custom_parameter_groups = [
    {
        "params": ["classifier.weight"],
        "lr": 1e-3,
    },
    {
        "params": ["classifier.bias"],
        "lr": 1e-3,
        "weight_decay": 0.0,
    },
]
# Create a ClassificationModel
model = ClassificationModel(
    "bert", "bert-base-cased", args=model_args
)

# Train the model
model.train_model(train_df)

```

## Options For Downloading Pre-Trained Models

Most Simple Transformers models will use the `from_pretrained()` [method](https://huggingface.co/transformers/main_classes/model.html#transformers.PreTrainedModel.from_pretrained) from the Hugging Face Transformers library to download pre-trained models. You can pass `kwargs` to this method to configure things like proxies and force downloading (refer to method link above).

You can pass these `kwargs` when initializing a Simple Transformers task-specific model to access the same functionality. For example, if you are behind a firewall and need to set the proxy settings;

```python
model = ClassficationModel(
    "bert",
    "bert-base-cased",
    proxies={"http": "foo.bar:3128", "http://hostname": "foo.bar:4012"}
)
```

## ONNX Support (Beta)

Simple Transformers has ONNX support for Classification and NER tasks. These models can be converted to an ONNX model and run through the ONNX-runtime.

**Heads up:** ONNX support should be considered experimental at this time. If you encounter any problems, please open an issue in the repo. Please provide a detailed explanation and the **minimal** code necessary to replicate the issue.
{: .notice--warning}

### ONNX setup

Please refer to the following pages for instructions on installing ONNX.

- [ONNX](https://github.com/onnx/onnx#installation)
- [ONNX Runtime](https://github.com/microsoft/onnxruntime#get-started)

### Converting a Simple Transformers model to the ONNX format.

The following models are currently compatible:

- ClassificationModel
- NERModel

These models can be converted by calling the `convert_to_onnx()` method. You can change the output directory by specifying `output_dir` when calling this method.

```python
from simpletransformers.classification import (
    ClassificationModel,
    ClassificationArgs,
)


model = ClassificationModel(
    "roberta",
    "roberta-base",
)

model.convert_to_onnx("onnx_outputs")

```

### Loading a converted ONNX model

You can load the ONNX model just as you would load any other model in Simple Transformers.

```python
from simpletransformers.classification import (
    ClassificationModel,
    ClassificationArgs,
)


model = ClassificationModel(
    "roberta",
    "onnx_outputs",
)

model.convert_to_onnx("onnx_outputs")

```

After the model is loaded, you can use the `predict()` method to make predictions.

### Code example

```python
from time import time

from simpletransformers.classification import (
    ClassificationModel,
    ClassificationArgs,
)


model_args = ClassificationArgs()
model_args.overwrite_output_dir = True


# Create a TransformerModel
model = ClassificationModel(
    "roberta",
    "roberta-base",
    use_cuda=False,
    args=model_args,
)

start = time()
print(model.predict(["test " * 450]))
end = time()
print(f"Pytorch CPU: {end - start}")

model.convert_to_onnx("onnx_outputs")

model_args.dynamic_quantize = True

model = ClassificationModel(
    "roberta",
    "onnx_outputs",
    args=model_args,
)

start = time()
print(model.predict(["test " * 450]))
end = time()
print(f"ONNX CPU (Cold): {end - start}")

start = time()
print(model.predict(["test " * 450]))
end = time()
print(f"ONNX CPU (Warm): {end - start}")

```

### Execution Providers

ONNX-Runtime supports many different [Execution Providers](https://github.com/microsoft/onnxruntime/tree/master/docs/execution_providers).

If `use_cuda` is `True`, `CUDAExecutionProvider` will be used. If it is `False`, the `CPUExecutionProvider` will be used.

You can manually specify the provider using the `onnx_execution_provider` argument when loading a model.

```python
model = ClassificationModel(
    "roberta",
    "onnx_outputs",
    args=model_args,
    onnx_execution_provider="CPUExecutionProvider",
)

```

*Note that the library is only tested with CPU and CUDA Execution Providers*


## Saving checkpoints

### Don't save model checkpoints

When training takes little time we may want to save no intermediary checkpoints to reduce disk space usage and training time.   
Note that the model artifacts will still be saved to `output_dir` when the training process finishes.  
We can prevent the model from saving intermediary checkpoints by setting the following arguments: set `save_steps` to `-1` and  `save_model_every_epoch` to `False`



```python
from simpletransformers.classification import ClassificationModel, ClassificationArgs


model_args = ClassificationArgs()
model_args["save_steps"] = -1
model_args["save_model_every_epoch"] = False
model = ClassficationModel("bert", "bert-base-cased", args=model_args)
```

### Save model checkpoint every 3 epochs

Every model checkpoint takes the same disk space as a final model, When training transformer models for a high number of epochs we may not want to save checkpoints for every single epoch since this would take a lot of disk space. In the following example
You will see how to save a checkpoint every 3 epochs.

The procedure just requires two steps:
- Turn off automatic save after every epoch by setting `save_model_every_epoch` arg to `False`
- `save_steps` must be set to N(save every N epochs) times the number of steps the model will perform for every epoch


```python
from simpletransformers.classification import ClassificationModel, ClassificationArgs
import math
SAVE_EVERY_N_EPOCHS = 3
model_args = ClassificationArgs()
steps_per_epoch = math.floor(len(train_df) / SAVE_EVERY_N_EPOCHS)
if(len(train_df) % SAVE_EVERY_N_EPOCHS > 0):
    steps_per_epoch +=1
model_args["save_steps"] = steps_per_epoch * SAVE_EVERY_N_EPOCHS
model_args["save_model_every_epoch"] = False
model = ClassficationModel("bert", "bert-base-cased", args=model_args)
```
